ECCENTRIC: Edge-Cloud Collaboration Framework for Distributed Inference Using Knowledge Adaptation

Kamani, Mohammad Mahdi, Cheng, Zhongwei, Chen, Lin

arXiv.org Artificial Intelligence

The massive growth of edge AI has made machine learning applications ubiquitous across domains. Despite the computation and communication efficiency of these systems, the limited computational resources of edge devices make reliance on more powerful cloud-side systems inevitable in most cases. Cloud inference systems achieve the best performance, but their computation and communication costs grow dramatically as the number of edge devices relying on them expands. Hence, there is a trade-off between the computation, communication, and performance of these systems. In this paper, we propose a novel framework, dubbed Eccentric, that learns models with different levels of trade-offs between these conflicting objectives. Based on an adaptation of knowledge from the edge model to the cloud one, this framework reduces the computation and communication costs of the system during inference while achieving the best performance possible. The Eccentric framework can be viewed as a new form of compression suited to edge-cloud inference systems, reducing both computation and communication costs. Empirical studies on classification and object detection tasks corroborate the efficacy of this framework.


LightAgent: Mobile Agentic Foundation Models

Jiang, Yangqin, Huang, Chao

arXiv.org Artificial Intelligence

With the advancement of multimodal large language models (MLLMs), building GUI agent systems has become an increasingly promising direction, especially for mobile platforms, given their rich app ecosystems and intuitive touch interactions. Yet mobile GUI agents face a critical dilemma: truly on-device models (4B or smaller) lack sufficient performance, while capable models (starting from 7B) are either too large for mobile deployment or prohibitively costly (e.g., cloud-only closed-source MLLMs). To resolve this, we propose LightAgent, a mobile agentic foundation model solution that leverages device-cloud collaboration to tap the cost-efficiency of on-device models and the high capability of cloud models, while avoiding their drawbacks. Specifically, LightAgent enhances Qwen2.5-VL-3B via two-stage SFT->GRPO training on synthetic GUI data for strong decision-making, integrates an efficient long-reasoning mechanism to utilize historical interactions under tight resources, and defaults to on-device execution, escalating only challenging subtasks to the cloud via real-time complexity assessment. Experiments on the online AndroidLab benchmark and diverse apps show LightAgent matches or nears larger models, with a significant reduction in cloud costs.
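The escalate-only-when-hard behavior can be sketched as a threshold rule. The complexity heuristic below (instruction length plus interaction depth) is purely illustrative; LightAgent's actual real-time complexity assessment is not specified at this level of detail in the abstract:

```python
def assess_complexity(subtask: str, history: list) -> float:
    """Illustrative stand-in for a real-time complexity score:
    longer instructions and deeper interaction histories count as harder."""
    return 0.01 * len(subtask.split()) + 0.05 * len(history)

def dispatch(subtask: str, history: list, threshold: float = 0.5) -> str:
    """Default to the on-device model; escalate a subtask to the cloud
    model only when its estimated complexity exceeds the threshold."""
    return "cloud" if assess_complexity(subtask, history) > threshold else "edge"
```

A short, fresh subtask such as `dispatch("tap the settings icon", [])` stays on device, keeping cloud calls for the rare hard cases.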


Reliable Inference in Edge-Cloud Model Cascades via Conformal Alignment

Huang, Jiayi, Park, Sangwoo, Paoletti, Nicola, Simeone, Osvaldo

arXiv.org Machine Learning

Edge intelligence enables low-latency inference via compact on-device models, but assuring reliability remains challenging. We study edge-cloud cascades that must preserve conditional coverage: whenever the edge returns a prediction set, it should contain the true label with a user-specified probability, as if produced by the cloud model. We formalize conditional coverage with respect to the cloud predictive distribution, and introduce a conformal alignment-based (CAb) cascading mechanism that certifies this property with user control over the risk level. Our method casts escalation from edge to cloud models as a multiple-hypothesis testing (MHT) problem, tailoring conformal alignment (CA) to select which inputs can be safely handled at the edge. The proposed CAb model cascading method yields statistical guarantees on the average fraction of edge decisions that satisfy cloud-level conditional coverage. The procedure applies to arbitrary edge prediction sets, including variants of conformal prediction (CP), and exposes a tunable trade-off among coverage, deferral rate, and set size. Experiments on CIFAR-100 image classification and the TeleQnA question-answering (QA) benchmark show that the proposed CAb cascade maintains the target conditional coverage for edge predictions while substantially reducing offloading to the cloud and incurring modest increases in prediction-set size.
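The building blocks above can be illustrated with a minimal split-conformal sketch. The calibration and prediction-set routines follow the standard CP recipe; the deferral rule shown (escalate when the edge set grows too large) is a simplified stand-in for the paper's conformal-alignment multiple-hypothesis test, not the actual CAb procedure:

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal calibration: nonconformity = 1 - prob of the true
    label; return the (1 - alpha)-quantile with finite-sample correction."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = min(math.ceil((n + 1) * (1 - alpha)) - 1, n - 1)
    return scores[k]

def prediction_set(probs, qhat):
    """All labels whose nonconformity score is below the threshold."""
    return {y for y, p in enumerate(probs) if 1.0 - p <= qhat}

def cascade(probs, qhat, max_set_size=2):
    """Serve at the edge when the set is small and informative; otherwise
    defer to the cloud (a proxy for the conformal-alignment deferral test)."""
    s = prediction_set(probs, qhat)
    return ("edge", s) if len(s) <= max_set_size else ("cloud", None)
```

On exchangeable data, sets built this way contain the true label with probability at least 1 - alpha, which is the marginal guarantee the cascade then refines toward cloud-level conditional coverage.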


Cloud Model Characteristic Function Auto-Encoder: Integrating Cloud Model Theory with MMD Regularization for Enhanced Generative Modeling

Hu, Biao, Wang, Guoyin

arXiv.org Artificial Intelligence

We introduce Cloud Model Characteristic Function Auto-Encoder (CMCFAE), a novel generative model that integrates the cloud model into the Wasserstein Auto-Encoder (WAE) framework. By leveraging the characteristic functions of the cloud model to regularize the latent space, our approach enables more accurate modeling of complex data distributions. Unlike conventional methods that rely on a standard Gaussian prior and traditional divergence measures, our method employs a cloud model prior, providing a more flexible and realistic representation of the latent space, thus mitigating the homogenization observed in reconstructed samples. We derive the characteristic function of the cloud model and propose a corresponding regularizer within the WAE framework. Extensive quantitative and qualitative evaluations on MNIST, FashionMNIST, CIFAR-10, and CelebA demonstrate that CMCFAE outperforms existing models in terms of reconstruction quality, latent space structuring, and sample diversity. This work not only establishes a novel integration of cloud model theory with MMD-based regularization but also offers a promising new perspective for enhancing autoencoder-based generative models.
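The WAE-family regularization that CMCFAE builds on can be sketched with a plain kernel MMD estimate between encoded latents and prior samples. This is a generic sketch: it uses an RBF kernel and Gaussian stand-ins, whereas the paper regularizes via the characteristic function of a cloud-model prior:

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between sample sets X and Y:
    E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]. Zero when X == Y."""
    m, n = len(X), len(Y)
    xx = sum(rbf(a, b, gamma) for a in X for b in X) / (m * m)
    yy = sum(rbf(a, b, gamma) for a in Y for b in Y) / (n * n)
    xy = sum(rbf(a, b, gamma) for a in X for b in Y) / (m * n)
    return xx + yy - 2.0 * xy
```

Penalizing this quantity between encoder outputs and draws from the prior is what pulls the latent distribution toward the prior without requiring a tractable density, which is the role the cloud-model characteristic-function regularizer plays in CMCFAE.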


High-Quality Pseudo-Label Generation Based on Visual Prompt Assisted Cloud Model Update

Xu, Xinrun, Zhang, Qiuhong, Yang, Jianwen, Lian, Zhanbiao, Yan, Jin, Ding, Zhiming, Jiang, Shan

arXiv.org Artificial Intelligence

Generating high-quality pseudo-labels on the cloud side is crucial for cloud-edge collaborative object detection, especially in dynamic traffic monitoring scenarios where the target data distribution continuously evolves. Existing methods often assume a perfectly reliable cloud model, neglecting the potential for errors in the cloud's predictions, or employ simple adaptation techniques that struggle to handle complex distribution shifts. This paper proposes a novel Cloud-Adaptive High-Quality Pseudo-label generation algorithm (CA-HQP) that addresses these limitations by incorporating a learnable Visual Prompt Generator (VPG) and a dual feature alignment strategy into the cloud model updating process. The VPG enables parameter-efficient adaptation of the large pre-trained cloud model by injecting task-specific visual prompts into the model's input, enhancing its flexibility without extensive fine-tuning. To mitigate domain discrepancies, CA-HQP introduces two complementary feature alignment techniques: a global Domain Query Feature Alignment (DQFA) that captures scene-level distribution shifts and a fine-grained Temporal Instance-Aware Feature Embedding Alignment (TIAFA) that addresses instance-level variations. Extensive experiments on the Bellevue traffic dataset, a challenging real-world traffic monitoring dataset, demonstrate that CA-HQP significantly improves the quality of pseudo-labels compared to existing state-of-the-art cloud-edge collaborative object detection methods. Further ablation studies validate the contribution of each individual component (DQFA, TIAFA, VPG) and confirm the synergistic effect of combining global and instance-level feature alignment strategies.


Edge-Cloud Routing for Text-to-Image Model with Token-Level Multi-Metric Prediction

Xin, Zewei, Li, Qinya, Niu, Chaoyue, Wu, Fan

arXiv.org Artificial Intelligence

Large text-to-image models demonstrate impressive generation capabilities; however, their substantial size necessitates expensive cloud servers for deployment. Conversely, light-weight models can be deployed on edge devices at lower cost but often with inferior generation quality for complex user prompts. To strike a balance between performance and cost, we propose a routing framework, called RouteT2I, which dynamically selects either the large cloud model or the light-weight edge model for each user prompt. Since generated image quality is challenging to measure directly, RouteT2I establishes multi-dimensional quality metrics, in particular by evaluating the similarity between the generated images and both positive and negative texts that describe each specific quality metric. RouteT2I then predicts the expected quality of the generated images by identifying key tokens in the prompt and comparing their impact on the quality. RouteT2I further introduces the Pareto relative superiority to compare the multi-metric quality of the generated images. Based on this comparison and predefined cost constraints, RouteT2I allocates prompts to either the edge or the cloud. Evaluation reveals that RouteT2I significantly reduces the number of requests to the large cloud model while maintaining high-quality image generation.
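One illustrative reading of routing on multi-metric quality is sketched below. The superiority score here (fraction of metrics on which one candidate strictly beats the other) and the budget-aware rule are our simplifications; the paper's Pareto relative superiority and cost constraints may be defined differently:

```python
def pareto_relative_superiority(q_a, q_b):
    """Illustrative score: fraction of quality metrics on which
    candidate a strictly beats candidate b (higher = better)."""
    wins = sum(a > b for a, b in zip(q_a, q_b))
    return wins / len(q_a)

def route_prompt(pred_edge_q, pred_cloud_q, cloud_calls_left, threshold=0.5):
    """Send the prompt to the edge model unless the predicted cloud
    quality is sufficiently superior and the cloud budget allows it."""
    if cloud_calls_left <= 0:
        return "edge"
    sup = pareto_relative_superiority(pred_cloud_q, pred_edge_q)
    return "cloud" if sup > threshold else "edge"
```

The point of a per-metric comparison, rather than a single scalar score, is that a prompt is offloaded only when the cloud model is predicted to win on most quality dimensions at once.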


End-Cloud Collaboration Framework for Advanced AI Customer Service in E-commerce

Teng, Liangyu, Liu, Yang, Liu, Jing, Song, Liang

arXiv.org Artificial Intelligence

In recent years, the e-commerce industry has seen a rapid increase in the demand for advanced AI-driven customer service solutions. Traditional cloud-based models face limitations in terms of latency, personalized services, and privacy concerns. Furthermore, end devices often lack the computational resources to deploy large AI models effectively. In this paper, we propose an innovative End-Cloud Collaboration (ECC) framework for advanced AI customer service in e-commerce. This framework integrates the advantages of large cloud models and mid/small-sized end models by deeply exploring the generalization potential of cloud models and effectively utilizing the computing power resources of terminal chips, alleviating the strain on computing resources to some extent. Specifically, the large cloud model acts as a teacher, guiding and promoting the learning of the end model, which significantly reduces the end model's reliance on large-scale, high-quality data and thereby addresses the data bottleneck in traditional end model training, offering a new paradigm for the rapid deployment of industry applications. Additionally, we introduce an online evolutive learning strategy that enables the end model to continuously iterate and upgrade based on guidance from the cloud model and real-time user feedback. This strategy ensures that the model can flexibly adapt to rapid changes in application scenarios while avoiding the uploading of sensitive information by performing local fine-tuning, achieving the dual goals of privacy protection and personalized service. Finally, we implement in-depth corpus collection (e.g., data organization, cleaning, and preprocessing) and train an ECC-based industry-specific model for e-commerce customer service.
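Cloud-teacher-to-end-student guidance of this kind is typically realized with a knowledge-distillation loss. The framework-free sketch below shows the standard formulation (softened teacher targets plus hard-label cross-entropy); the temperature T and mixing weight alpha are illustrative hyperparameters, not values from the paper:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, true_label, T=2.0, alpha=0.5):
    """Distillation loss: cross-entropy of the student's softened
    predictions against the teacher's softened targets (scaled by T^2,
    as is conventional), mixed with hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = -sum(t * math.log(s) for t, s in zip(p_t, p_s)) * T * T
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1.0 - alpha) * hard
```

Minimizing this loss lets the end model learn from the cloud teacher's full output distribution rather than from labeled data alone, which is what reduces the dependence on large, high-quality end-side datasets.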


Robust Load Prediction of Power Network Clusters Based on Cloud-Model-Improved Transformer

Jiang, Cheng, Lu, Gang, Ma, Xue, Wu, Di

arXiv.org Artificial Intelligence

Load data from power network clusters indicates economic development in each area, which is crucial for predicting regional trends and guiding power enterprise decisions. The Transformer model, a leading method for load prediction, faces challenges modeling historical data due to variables such as weather, events, festivals, and data volatility. To tackle this, the cloud model's fuzzy feature is utilized to manage uncertainties effectively. We present an innovative approach, the Cloud Model Improved Transformer (CMIT), which integrates the Transformer model with the cloud model using the particle swarm optimization algorithm, with the aim of achieving robust and precise power load predictions. Comparative experiments conducted on 31 real datasets within a power network cluster demonstrate that CMIT significantly surpasses the Transformer model in prediction accuracy, highlighting its effectiveness in enhancing forecasting capabilities in the power network cluster sector.
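The cloud model invoked here and in the CMCFAE abstract above is, to our understanding, the normal cloud model from uncertainty theory, parameterized by expectation Ex, entropy En, and hyper-entropy He. A minimal forward cloud generator, sketched under that assumption, shows how the fuzzy feature arises:

```python
import random

def forward_normal_cloud(Ex, En, He, n, seed=0):
    """Forward normal cloud generator: each drop x_i ~ N(Ex, En_i^2),
    where the per-drop dispersion En_i is itself drawn from N(En, He^2).
    He > 0 makes the dispersion random, so the drops form a fuzzy
    'cloud' around Ex rather than a crisp Gaussian."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        En_i = rng.gauss(En, He)
        drops.append(rng.gauss(Ex, abs(En_i)))
    return drops
```

Drops generated this way concentrate around Ex while their spread itself fluctuates, which is the kind of second-order uncertainty (e.g., weather- or event-driven load volatility) the abstract says the cloud model manages.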


Text Sentiment Analysis and Classification Based on Bidirectional Gated Recurrent Units (GRUs) Model

Xu, Wei, Chen, Jianlong, Ding, Zhicheng, Wang, Jinyin

arXiv.org Artificial Intelligence

This paper explores the importance of text sentiment analysis and classification in the field of natural language processing and proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model. The study first analyses the word cloud model of the text across six sentiment labels, and then carries out data preprocessing, including removing special symbols, punctuation marks, numbers, stop words, and non-alphabetic parts. The data set is then divided into a training set and a test set. Through model training and testing, the accuracy on the validation set rises from 85% to 93%, an increase of 8 percentage points; at the same time, the loss value on the validation set decreases from 0.7 to 0.1 and stabilizes, with the model's predictions gradually approaching the actual values, enabling effective classification of text emotions. The confusion matrix shows that the accuracy of the model on the test set reaches 94.8%, the precision is 95.9%, the recall is 99.1%, and the F1 score is 97.4%, which demonstrates that the model has good generalisation ability and classification performance. Overall, the study presents an effective method for text sentiment analysis and classification with satisfactory results.
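The preprocessing steps listed (removing special symbols, punctuation, numbers, stop words, and non-alphabetic parts) can be sketched in a few lines. The stop-word set below is an illustrative subset, not the list used in the study:

```python
import re

# Illustrative subset of English stop words; a real pipeline would use
# a full list (e.g., from NLTK or spaCy).
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}

def preprocess(text: str) -> list:
    """Lowercase, strip everything that is not a letter or whitespace
    (punctuation, digits, special symbols), then drop stop words and
    any remaining non-alphabetic tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [t for t in text.split() if t.isalpha() and t not in STOP_WORDS]
```

For example, `preprocess("The movie, in 2023, was GREAT!!")` yields `['movie', 'was', 'great']`; the surviving tokens are what gets embedded and fed to the bidirectional GRU.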


Integrating ChatGPT into Secure Hospital Networks: A Case Study on Improving Radiology Report Analysis

Kim, Kyungsu, Park, Junhyun, Langarica, Saul, Alkhadrawi, Adham Mahmoud, Do, Synho

arXiv.org Artificial Intelligence

This research explores the integration of artificial intelligence (AI), specifically large language models (LLMs) such as ChatGPT, into hospital radiology, with an emphasis on maintaining security during implementation. Despite the proven effectiveness of these AI tools in processing radiological reports [1, 2, 3], their integration into hospital environments poses challenges due to the sensitive nature of patient data and the need for data confidentiality [4]. The direct use of cloud-based LLMs like ChatGPT is limited by data security concerns, especially when considering healthcare regulations such as HIPAA [5] and GDPR [6]. Our study addresses this by adapting these LLMs for secure, internal use within hospital radiology departments, transforming them into closed-network systems that comply with healthcare privacy standards, thereby leveraging the advanced capabilities of LLMs while safeguarding patient data privacy. This paper examines how radiology reports can be automatically classified as normal or abnormal using cloud-based, high-performing LLMs like ChatGPT, with the goal of adapting these models for secure, internal use within hospital networks. In doing so, it aims to enhance hospital workflows by streamlining the analysis of radiology findings, potentially leading to more efficient and accurate medical diagnostics and patient care management. This investigation is important for enhancing the practical utility of AI in radiology, ensuring both technological advancement and adherence to the paramount principle of patient confidentiality.